Collaborating Authors

Seoul National University, Seoul

Position-based Scaled Gradient for Model Quantization and Pruning - Appendix

Neural Information Processing Systems

In this experiment, we quantize only the weights, not the activations, to compare the performance degradation as the weight bit-width decreases. The mean squared errors (MSE) of the weights across different bit-widths are also reported. Each column shows the layer name with its number of parameters in parentheses. All numbers are results from the last epoch. Table A3: ResNet-32 trained with Adam on the CIFAR-100 dataset.
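As a rough illustration of the measurement described above, the sketch below uniformly quantizes a weight tensor at several bit-widths and reports the MSE against the full-precision weights. The min-max uniform rounding scheme and the synthetic Gaussian weights are assumptions made here for illustration; they are not necessarily the paper's exact quantizer or setup.

```python
import numpy as np

def quantize_weights(w: np.ndarray, bits: int) -> np.ndarray:
    """Uniformly quantize weights to the given bit-width (min-max, assumed scheme)."""
    levels = 2 ** bits - 1
    w_min, w_max = w.min(), w.max()
    scale = (w_max - w_min) / levels
    # Round to the nearest quantization level, then map back to the original range.
    return np.round((w - w_min) / scale) * scale + w_min

def weight_mse(w: np.ndarray, bits: int) -> float:
    """Mean squared error between full-precision and quantized weights."""
    return float(np.mean((w - quantize_weights(w, bits)) ** 2))

# Synthetic stand-in for a layer's weights: MSE grows as bit-width decreases.
rng = np.random.default_rng(0)
w = rng.normal(size=10_000).astype(np.float32)
for b in (8, 4, 2):
    print(f"{b}-bit MSE: {weight_mse(w, b):.6f}")
```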


Supplementary Material for Lipschitz-Certifiable Training with a Tight Outer Bound

Neural Information Processing Systems

We want to prove that p∗ is a local minimum of (11); then, since (11) is a convex optimization problem, p∗ is the global optimum. We consider a closed local neighborhood B(p∗, δ) with δ > 0 such that for any q ∈ B(p∗, δ) we have q ≥ 0, and we can ignore the box constraint on q_l for l ∈ Jᶜ. We denote a local optimal solution of (11) in B(p∗, δ) by p′. Moreover, if ‖p′‖ < 1, then we can further extend p′[Jᶜ] to produce a larger inner product with v, which contradicts the assumption. After propagating a ball B₂(µ, ρ) through a ReLU layer, we can estimate the propagated outer bound with a new ball B₂(µ⁺, ρ), where µ⁺ = max(µ, 0). However, the true image ReLU(B₂(µ, ρ)) has no negative elements, while B₂(µ⁺, ρ) may still contain points with negative coordinates.
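A minimal sketch of this outer-bound propagation, following the formula stated above (the function name propagate_relu_ball is illustrative, not from the paper). Since ReLU is 1-Lipschitz, the radius ρ can be kept unchanged while the center is clipped element-wise.

```python
import numpy as np

def propagate_relu_ball(mu: np.ndarray, rho: float):
    """Outer bound of ReLU(B2(mu, rho)) as a new l2 ball B2(mu_plus, rho).

    ReLU is 1-Lipschitz, so the radius rho is preserved; the center is
    clipped element-wise to mu_plus = max(mu, 0), as described above.
    """
    mu_plus = np.maximum(mu, 0.0)
    return mu_plus, rho

mu = np.array([-0.5, 0.2, 1.3])
center, radius = propagate_relu_ball(mu, rho=0.1)
print(center, radius)  # -> [0.  0.2 1.3] 0.1
```

Note that the returned ball still extends below zero wherever a clipped center coordinate is smaller than ρ, which is exactly the looseness the passage points out.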